Abstract

A single signal processing algorithm can be represented by many different but
mathematically equivalent formulas. When these formulas are implemented in
actual code, they often have very different running times. Thus, an important
problem is finding a formula that implements the signal processing algorithm as
efficiently as possible. In this paper we present three major results toward this
goal: (1) Different but mathematically equivalent formulas can be generated
automatically in a principled way, (2) Simple features describing formulas can
be used to distinguish formulas with significantly different running times, and
(3) A function approximator can learn to accurately predict the running time
of a formula given a limited set of training data.


This research was sponsored by the DARPA Grant No. DABT63-98-1-0004.
The first author, Bryan Singer, is partly supported by a National Science Foundation
Graduate Fellowship.

The content of the information in this publication does not necessarily reflect the
position or the policy of the Defense Advanced Research Projects Agency (DARPA),
the National Science Foundation (NSF), or the US Government, and no official
endorsement should be inferred.


DISTRIBUTION STATEMENT A

Approved for Public Release
Distribution Unlimited




Keywords: machine learning, signal processing, FFT, performance prediction,
mathematical algorithms, application of neural networks, OPAL


1 Introduction


Most signal processing algorithms can be represented by a matrix that, when
multiplied by an input vector, produces the desired output vector [4, 5]. A
straightforward implementation of the algorithm would simply implement the
multiplication of the specified matrix and the input vector. However, these
matrices often have a particular form that allows them to be factored into a
product of sparse, structured matrices. These factorizations allow for faster
implementations of signal processing algorithms. Further, these factorizations
can be represented by mathematical formulas [1].

A single signal processing algorithm can be represented by many different
but mathematically equivalent formulas. When these formulas are implemented
in actual code, they often have very different running times. Thus, an important
problem is finding a formula that implements the signal processing algorithm
as efficiently as possible [3].

This paper presents our preliminary work towards this goal. In particular,
this paper contains three major results:

• Different but mathematically equivalent formulas can be generated
automatically in a principled way.

• Simple features describing formulas can be used to distinguish formulas
with significantly different running times.

• A function approximator can learn to accurately predict the running time
of a formula given a limited set of training data.


2 Formula Generator

Given that there are many different formulas that represent a single signal
processing algorithm, an important problem is determining all the different
formulas that represent this algorithm. That is, if we want to find the fastest
formula that implements a particular algorithm, then we need to know the set of
formulas that represent the algorithm.

We have written a formula generator that takes a formula and a set of
rewrite rules and produces all mathematically equivalent formulas according
to the rewrite rules. This formula generator provides a principled method for
generating all mathematically equivalent formulas of some specified formula.

2.1 Rewrite Rules

A rewrite rule states how one formula can be “rewritten” as a different but
mathematically equivalent formula. Each rewrite rule consists of: (1) a template
formula, (2) a result formula, and (3) a set of variables. The template is a
formula that is to be matched with the current formula or a subexpression of
the current formula. The result formula is a formula that is mathematically
equivalent to the template, so the result formula can replace the template
formula.

Variables may be used in both the template and the result formulas. Two
kinds of variables are possible: input and computable variables. Input variables
simply match appropriate portions of the input formula and can be used to copy
those portions into the result formula. Thus, input variables allow templates to
match many different formulas. For example, an input variable could represent
the size of a particular object, as in the following example:

    (RULE TRANSPOSE-IDENTITY
      (vars n)
      (template (transpose (i n)))
      (result (i n)))

which says that whenever we find a transpose of the identity matrix (of any size
n), we can replace it simply by the identity matrix (of the same size n).
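To make the role of input variables concrete, here is a minimal Python sketch. It assumes formulas are encoded as nested tuples that mirror the S-expressions above (for example `("transpose", ("i", 32))`); the names `match`, `substitute`, and `apply_rule` are hypothetical, and this is not the generator's actual implementation. It only shows how a template binds an input variable and how the binding is copied into the result.

```python
def match(template, formula, variables, bindings=None):
    """Match template against formula; return the variable bindings or None.

    Symbols in `variables` are input variables: each matches any subformula
    (or number), but every occurrence must match the same value.
    """
    bindings = {} if bindings is None else dict(bindings)
    if isinstance(template, str) and template in variables:
        if template in bindings and bindings[template] != formula:
            return None
        bindings[template] = formula
        return bindings
    if isinstance(template, tuple):
        if not isinstance(formula, tuple) or len(template) != len(formula):
            return None
        for t, f in zip(template, formula):
            bindings = match(t, f, variables, bindings)
            if bindings is None:
                return None
        return bindings
    # Any other symbol or number must match literally.
    return bindings if template == formula else None


def substitute(result, bindings):
    """Copy the result formula, replacing each bound variable by its value."""
    if isinstance(result, tuple):
        return tuple(substitute(r, bindings) for r in result)
    if isinstance(result, str) and result in bindings:
        return bindings[result]
    return result


def apply_rule(rule, formula):
    """Rewrite formula at its root if the rule's template matches, else None."""
    bindings = match(rule["template"], formula, rule["vars"])
    return None if bindings is None else substitute(rule["result"], bindings)


# (RULE TRANSPOSE-IDENTITY (vars n) (template (transpose (i n))) (result (i n)))
TRANSPOSE_IDENTITY = {
    "vars": {"n"},
    "template": ("transpose", ("i", "n")),
    "result": ("i", "n"),
}

print(apply_rule(TRANSPOSE_IDENTITY, ("transpose", ("i", 32))))  # ('i', 32)
print(apply_rule(TRANSPOSE_IDENTITY, ("transpose", ("f", 32))))  # None
```

Because `n` is bound while matching, the same rule fires for an identity of any size, which is exactly what makes one template cover many formulas.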

Computable variables allow values to be computed from input variables that
can be used in the result formula. For example, two computable variables could
be used to capture a factorization of an integer. In particular, several sets of
computable variables can be defined, and for each set of computable variables a
function must be given. This function may take as arguments any of the input
variables or constants. The function then produces a list of sets of values for
the computable variables. Each set of values on this list is used to produce a
different result formula, and thus a single rule matching a single formula can
actually produce many result formulas.

As an example, consider the rewrite rule:

    (RULE COOLEY-TUKEY
      (vars n)
      (template (f n))
      (c-vars ((r s) (factors n)))
      (result
        (compose (compose (compose (tensor (f r) (i s))
                                   (t n s))
                          (tensor (i r) (f s)))
                 (l n r))))

which says that $F_n$ can be replaced by
$(F_r \otimes I_s)\, T^n_s\, (I_r \otimes F_s)\, L^n_r$ for any integer
factorization $n = rs$ (assuming the function “factors” is appropriately defined).
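Computable variables can be sketched with formulas encoded as nested Python tuples that mirror the S-expressions of the rule. Below, a hypothetical `factors` returns every ordered pair (r, s) with r, s > 1 and rs = n (one plausible reading of “appropriately defined”), and each pair yields a separate result formula, so a single match of (f n) produces several rewrites. The function `cooley_tukey` is an illustrative stand-in for the rule, not the generator's code.

```python
def factors(n):
    """Every ordered pair (r, s) with r, s > 1 and r * s == n (assumed definition)."""
    return [(r, n // r) for r in range(2, n) if n % r == 0]


def cooley_tukey(formula):
    """Apply COOLEY-TUKEY at the root of formula: one result per value of (r, s)."""
    if not (isinstance(formula, tuple) and formula[0] == "f"):
        return []  # the template (f n) does not match
    n = formula[1]  # input variable n
    return [("compose",
             ("compose",
              ("compose", ("tensor", ("f", r), ("i", s)), ("t", n, s)),
              ("tensor", ("i", r), ("f", s))),
             ("l", n, r))
            for r, s in factors(n)]  # computable variables r, s


for result in cooley_tukey(("f", 8)):
    print(result)
```

For (f 8) this yields two results, one for (r, s) = (2, 4) and one for (4, 2); a prime size such as (f 7) yields none.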

2.2 Formula Search Space

The number of formulas that can be produced by our formula generator can
be very large. When given a formula and a set of rewrite rules, the formula
generator tries to apply each of the rewrite rules to the given formula and to
all subexpressions of the formula. Furthermore, if any of these applications
produces a resulting formula, then all of the rewrite rules can be recursively
tested on the resulting formula and all of its subexpressions.

Currently our formula generator uses breadth-first search. An open research
question is how to avoid producing an infinite set of formulas, most of which
are useless (e.g., (transpose (transpose (transpose (f 32))))).
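A breadth-first generator can be sketched as follows, assuming formulas are encoded as nested Python tuples that mirror the S-expressions of the rules. This minimal version hard-codes only the COOLEY-TUKEY rule (with a hypothetical `factors`), applies it at every subexpression, and uses a visited set to drop formulas reached by more than one derivation. With Cooley-Tukey alone every rewrite replaces an $F_n$ by strictly smaller ones, so the search terminates; a visited set would not bound the search for a rule that can keep wrapping a formula in transposes, which is the open question above.

```python
from collections import deque


def factors(n):
    """Ordered pairs (r, s) with r, s > 1 and r * s == n (assumed definition)."""
    return [(r, n // r) for r in range(2, n) if n % r == 0]


def rewrite_root(formula):
    """Results of the COOLEY-TUKEY rule applied at the root of formula."""
    if not (isinstance(formula, tuple) and formula[0] == "f"):
        return []
    n = formula[1]
    return [("compose",
             ("compose",
              ("compose", ("tensor", ("f", r), ("i", s)), ("t", n, s)),
              ("tensor", ("i", r), ("f", s))),
             ("l", n, r))
            for r, s in factors(n)]


def rewrite_anywhere(formula):
    """Apply the rule at the root and, recursively, inside every subexpression."""
    results = rewrite_root(formula)
    if isinstance(formula, tuple):
        for k, child in enumerate(formula):
            for new_child in rewrite_anywhere(child):
                results.append(formula[:k] + (new_child,) + formula[k + 1:])
    return results


def generate(start):
    """Breadth-first search over every formula reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        formula = queue.popleft()
        for new in rewrite_anywhere(formula):
            if new not in seen:  # skip formulas reached by another derivation
                seen.add(new)
                queue.append(new)
    return seen


print(len(generate(("f", 16))))  # 15
print(len(generate(("f", 32))))  # 51
```

For $F_{16}$ the sketch finds 15 formulas and for $F_{32}$ it finds 51, the same counts as the “All Formulas” row of Table 1.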


3 Cooley-Tukey

A very important signal processing algorithm is the Fast Fourier Transform
(FFT) [5]. One particularly useful factorization of the FFT is the Cooley-Tukey
factorization, which has the form
$F_{rs} = (F_r \otimes I_s)\, T^{rs}_s\, (I_r \otimes F_s)\, L^{rs}_r$. The key
aspect of this factorization is that it splits a large FFT, $F_{rs}$, into two
smaller FFTs, $F_r$ and $F_s$. This can be visualized as a tree, as in
Figure 1(a). Likewise, a more complicated factorization such as

$$\Big( \big[ (F_2 \otimes I_4)\, T^8_4\, (I_2 \otimes F_4)\, L^8_2 \big] \otimes I_4 \Big)\, T^{32}_4\, \Big( I_8 \otimes \big[ (F_2 \otimes I_2)\, T^4_2\, (I_2 \otimes F_2)\, L^4_2 \big] \Big)\, L^{32}_8$$

can be more compactly represented in the split tree shown in Figure 1(b).
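The factorization can be checked numerically on small sizes. The Python sketch below builds dense matrices for both sides of the Cooley-Tukey rule and compares them. The text does not spell out $T$ and $L$, so the definitions here are assumptions following a common convention: $\omega_n = e^{-2\pi i/n}$, $T^{rs}_s$ is diagonal with entry $\omega_{rs}^{ab}$ at position $as + b$, and $L^{rs}_r$ moves input element $a + rb$ to position $as + b$ (for $0 \le a < r$, $0 \le b < s$). All helper names are hypothetical.

```python
import cmath


def dft(n):
    """Dense DFT matrix F_n with entries w_n^(k*l), w_n = exp(-2*pi*i/n)."""
    return [[cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n)]
            for k in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def tensor(a, b):
    """Kronecker (tensor) product of matrices a and b."""
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
            for i in range(len(a) * p)]


def compose(a, b):
    """Matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def twiddle(n, s):
    """Assumed T^n_s: diagonal, entry w_n^(a*b) at position a*s + b."""
    t = [[0j] * n for _ in range(n)]
    for a in range(n // s):
        for b in range(s):
            t[a * s + b][a * s + b] = cmath.exp(-2j * cmath.pi * a * b / n)
    return t


def stride(n, r):
    """Assumed L^n_r: moves input element a + r*b to position a*s + b (s = n // r)."""
    s = n // r
    m = [[0.0] * n for _ in range(n)]
    for a in range(r):
        for b in range(s):
            m[a * s + b][a + r * b] = 1.0
    return m


def cooley_tukey_matrix(r, s):
    """(F_r (x) I_s) T^n_s (I_r (x) F_s) L^n_r with n = r*s."""
    n = r * s
    return compose(compose(compose(tensor(dft(r), identity(s)), twiddle(n, s)),
                           tensor(identity(r), dft(s))),
                   stride(n, r))


def max_error(x, y):
    return max(abs(u - v) for row_x, row_y in zip(x, y)
               for u, v in zip(row_x, row_y))


for r, s in [(2, 3), (2, 4), (4, 2), (3, 4)]:
    print(r, s, max_error(cooley_tukey_matrix(r, s), dft(r * s)))
```

The printed errors are at the level of floating-point rounding. Dense matrices make this check cubic in $n$, which is fine for tiny sizes but is not how an FFT package would evaluate a formula.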


Figure 1: Split Trees: (a) A split tree for $F_{rs}$ and (b) A split tree for $F_{32}$


Much of the data that we used in the experiments that follow involved using
the formula generator to produce all possible Cooley-Tukey expansions of a
particular sized $F_n$. As an example, $F_{128}$ has 731 different formulas that are
produced through applications of Cooley-Tukey. These formulas were then fed
to a rather good FFT package [2] to generate running times for each of the
formulas.
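Because each Cooley-Tukey expansion corresponds to exactly one split tree, the number of formulas can be checked without generating them. Writing $C(n)$ for the number of split trees of $F_n$ (the unexpanded $F_n$ counts as one), a short counting argument of our own gives the following recurrence over ordered factorizations:

$$C(n) = 1 + \sum_{\substack{rs = n \\ r,\, s > 1}} C(r)\, C(s), \qquad C(2) = 1.$$

For powers of two this gives $C(4) = 2$, $C(8) = 5$, $C(16) = 15$, $C(32) = 51$, $C(64) = 188$, and

$$C(128) = 1 + 2\,\big[\,C(2)C(64) + C(4)C(32) + C(8)C(16)\,\big] = 1 + 2\,(188 + 102 + 75) = 731,$$

matching the count quoted above.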


4 Relevant Features for Predicting Running Time

Given that many mathematically equivalent formulas have very different running
times when implemented, an important question to ask is: what about these
formulas determines their running times? Or, equivalently, what are good
features for predicting a formula's running time?

To answer these questions, we will begin by introducing several different
feature sets that can be used to describe Cooley-Tukey expansions of an FFT.
After each of these feature sets has been described, we will then compare
them along two different measures to see how well the features can
differentiate formulas with different running times.
4.1 Feature Sets


We begin by introducing a simple set of features to describe formulas. In
particular, we take advantage of the fact that all of these formulas are produced
by repeated applications of Cooley-Tukey to an FFT. Then we successively refine
these features in different ways to produce a class of feature sets.

4.1.1 Counting Leaf F's

One simple yet important feature of a Cooley-Tukey expansion of an FFT
formula is the number and sizes of the actual FFTs that appear in the formula.
These are the $F_n$'s that appear as leaves in the split tree. Specifically, we count
the number of $F_2$'s, the number of $F_4$'s, the number of $F_8$'s, and so on that
appear in the formula.

For example, the split tree shown in Figure 1(b) would have the features:

• 3 $F_2$'s

• 1 $F_4$

• 0 $F_8$'s

• 0 $F_{16}$'s

4.1.2 Counting All F's

Considering the previous features and the split tree, one modification of the
above features would be to count all of the F's that appear in all of the nodes
of the split tree instead of just those in the leaves. If we ignore the root
node, this is equivalent to counting the number of I's of different sizes in
the actual formula. Recall that the form of the Cooley-Tukey expansion is
$F_{rs} = (F_r \otimes I_s)\, T^{rs}_s\, (I_r \otimes F_s)\, L^{rs}_r$. While $F_r$
and $F_s$ may be recursively expanded with Cooley-Tukey, $I_r$ and $I_s$ are
maintained and thus leave a trace of how the split tree was built.

For example, the split tree shown in Figure 1(b) would have the features:

• 3 $I_2$'s

• 2 $I_4$'s

• 1 $I_8$

• 0 $I_{16}$'s
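The two feature sets defined so far can be computed directly on split trees. The Python sketch below assumes a hypothetical encoding in which a leaf is the integer $n$ (an unexpanded $F_n$) and an internal node is a pair (left, right); the Figure 1(b) tree, whose formula is given in Section 3, becomes ((2, 4), (2, 2)). Counting the sizes of all non-root nodes gives the I counts, as explained above.

```python
from collections import Counter


def size(tree):
    """Size n of the F_n at the root of a split tree (leaf = int, node = pair)."""
    return tree if isinstance(tree, int) else size(tree[0]) * size(tree[1])


def leaf_f(tree):
    """Leaf F features: number of leaves of each size."""
    if isinstance(tree, int):
        return Counter({tree: 1})
    return leaf_f(tree[0]) + leaf_f(tree[1])


def all_i(tree):
    """All I features: sizes of all non-root nodes, i.e. the I's in the formula."""
    if isinstance(tree, int):
        return Counter()
    left, right = tree
    return Counter([size(left), size(right)]) + all_i(left) + all_i(right)


# Figure 1(b): F_32 -> (F_8, F_4), F_8 -> (F_2, F_4), F_4 -> (F_2, F_2)
fig_1b = ((2, 4), (2, 2))
print(sorted(leaf_f(fig_1b).items()))  # [(2, 3), (4, 1)]
print(sorted(all_i(fig_1b).items()))   # [(2, 3), (4, 2), (8, 1)]
```

The two printed lines match the feature lists given for Figure 1(b) in Sections 4.1.1 and 4.1.2.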


4.1.3 Counting Leaf F's and All F's

For sufficiently large split trees, it is possible for two different formulas to have
the exact same I counts but different leaf F counts. For example, see
Figure 2. So, a simple refinement of the previous two feature sets would be to
include both. That is, we would count the number of $F_2$'s, the number of $I_2$'s,
the number of $F_4$'s, etc. in the formula.

For example, the split tree shown in Figure 1(b) would have the features:

• 3 $F_2$'s and 3 $I_2$'s

• 1 $F_4$ and 2 $I_4$'s

• 0 $F_8$'s and 1 $I_8$

• 0 $F_{16}$'s and 0 $I_{16}$'s

Figure 2: Two split trees with the same I counts but different Leaf F counts


4.1.4 Counting Left and Right Leaf F's

Consider again the first set of features that we introduced, which simply counted
all of the leaf F's. A different refinement of this would be to separate F's that
are right children of their parents in the tree from those that are left children.
In particular, we would count the number of left $F_2$'s, the number of right $F_2$'s,
the number of left $F_4$'s, and so on in the formula.

For example, the split tree shown in Figure 1(b) would have the features:

• 1 Right $F_2$ and 2 Left $F_2$'s

• 1 Right $F_4$ and 0 Left $F_4$'s

• 0 Right $F_8$'s and 0 Left $F_8$'s

• 0 Right $F_{16}$'s and 0 Left $F_{16}$'s

4.1.5 Counting Left and Right F's

Combining the idea in the previous subsection with the idea of counting
all the nodes in the split tree produces yet another set of features. In particular,
we count the number of different sized left and right F's appearing in the tree,
excluding the root node. This is equivalent to counting the number of different
sized I's on the right or left side of the tensor product within the formula itself.
For example, the split tree shown in Figure 1(b) would have the features:

• 1 Right $I_2$ and 2 Left $I_2$'s

• 2 Right $I_4$'s and 0 Left $I_4$'s

• 0 Right $I_8$'s and 1 Left $I_8$

• 0 Right $I_{16}$'s and 0 Left $I_{16}$'s
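Tracking which side each node hangs from gives the left/right variants. The Python sketch below assumes a hypothetical encoding in which a leaf is the integer $n$ (an unexpanded $F_n$) and an internal node is a pair (left, right), so the Figure 1(b) tree is ((2, 4), (2, 2)). Each node is tagged "L" or "R" by its position under its parent (the root gets no side), and the output reproduces the Figure 1(b) counts listed in Sections 4.1.4 and 4.1.5.

```python
from collections import Counter


def size(tree):
    """Size n of the F_n at the root of a split tree (leaf = int, node = pair)."""
    return tree if isinstance(tree, int) else size(tree[0]) * size(tree[1])


def left_right_leaf_f(tree, side=None):
    """Leaves counted by (side, size); side is 'L' or 'R', None for the root."""
    if isinstance(tree, int):
        return Counter({(side, tree): 1})
    return left_right_leaf_f(tree[0], "L") + left_right_leaf_f(tree[1], "R")


def left_right_all_i(tree):
    """Non-root nodes counted by (side, size): the Left/Right I features."""
    if isinstance(tree, int):
        return Counter()
    left, right = tree
    return (Counter([("L", size(left)), ("R", size(right))])
            + left_right_all_i(left) + left_right_all_i(right))


fig_1b = ((2, 4), (2, 2))
print(sorted(left_right_leaf_f(fig_1b).items()))
# [(('L', 2), 2), (('R', 2), 1), (('R', 4), 1)]
print(sorted(left_right_all_i(fig_1b).items()))
# [(('L', 2), 2), (('L', 8), 1), (('R', 2), 1), (('R', 4), 2)]
```

A left child $F_r$ is paired with $I_r$ on the left of $(I_r \otimes F_s)$, which is why counting left/right nodes and counting left/right I's coincide.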
4.1.6 Counting Left and Right Leaf F's and All I's

Once again, counting left and right I's cannot always distinguish two trees that
counting left and right leaf F's can distinguish. Thus, we again combine the two
into a larger set of features that includes all those in the previous two sets.

For example, the split tree shown in Figure 1(b) would have the features:

• 1 Right $F_2$ and 2 Left $F_2$'s and 1 Right $I_2$ and 2 Left $I_2$'s

• 1 Right $F_4$ and 0 Left $F_4$'s and 2 Right $I_4$'s and 0 Left $I_4$'s

• 0 Right $F_8$'s and 0 Left $F_8$'s and 0 Right $I_8$'s and 1 Left $I_8$

• 0 Right $F_{16}$'s and 0 Left $F_{16}$'s and 0 Right $I_{16}$'s and 0 Left $I_{16}$'s

4.2 Evaluating Features

4.2.1 Number of Partitions

Because several different formulas can have the same set of feature values, the
features can be thought of as generating a set of equivalence classes or partitions.
Under a set of features, formulas are indistinguishable if they have the same set
of feature values, while formulas are distinguishable if they have different feature
values.

Thus, a very simple measure of the effectiveness of a set of features is the
number of partitions it creates for a set of formulas. Some results are shown in
Table 1. As was discussed in Section 3, we used the automatic formula generator
to produce all Cooley-Tukey expansions of $F_{16}$, $F_{32}$, $F_{64}$, $F_{128}$, $F_{256}$,
and $F_{512}$. The bottom line of the table shows the number of different formulas
produced. The remaining lines show how many different partitions or equivalence
classes are generated by the different features for each set of formulas.


*16 

F32 

Fg4 

Fi28 

F2gg 

Fgio 

Leaf  F 

5 

7 

11 

15 

22 

30 

All  I 

7 

13 

31 

68 

168 

385 

Leaf  F  and  All  I 

7 

13 

31 

68 

168 

386 

Left /Right  Leaf  F 

11 

23 

44 

81 

142 

241 

Left/Right  All  I 

14 

45 

149 

523 

1832 

6585 

Left./Right  Leaf  F  and  All  I 

15 

49 

170 

617 

2262 

8473 

All  Formulas 

15 

51 

188 

731 

2950 

12235 

Table  1:  Number  of  Partitions  Generated  by  Different  Feature  Sets  for  all  of 
the  Coolev-Tukey  expansions  of  Different  Sized  Fn ’s 

Note that for all the sizes of $F_n$, as we move down through successive
refinements the number of partitions generally grows. That is, the feature sets
towards the bottom of the table usually split the formulas more than those
towards the top of the table. The one exception to this is the Left and Right
Leaf F feature set, which is really a refinement of the Leaf F features rather than
of the All I features. Also note that the final feature set, the Left and Right Leaf F
and All I features, is able to almost, but not quite, uniquely identify all the
formulas (for $F_{16}$ it does identify all 15). However, as the size of $F_n$ grows,
this feature set is less and less able to uniquely identify formulas.
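The partition counts can be reproduced for small sizes by brute force. The Python sketch below assumes a hypothetical encoding in which a leaf is the integer $n$ (an unexpanded $F_n$) and an internal node is a pair (left, right). It enumerates every split tree of $F_n$, maps each tree to a multiset of feature keys for each of the six feature sets, and counts the distinct multisets. Treating an unexpanded root as a leaf with no side and no I is our assumption. With these choices it gives 5, 7, 7, 11, 14, and 15 partitions over 15 formulas, the $F_{16}$ column of Table 1.

```python
from collections import Counter


def size(tree):
    """Size n of the F_n at the root of a split tree (leaf = int, node = pair)."""
    return tree if isinstance(tree, int) else size(tree[0]) * size(tree[1])


def split_trees(n):
    """Every split tree of F_n: the leaf n itself plus each (left, right) split."""
    trees = [n]
    for r in range(2, n):
        if n % r == 0:
            trees += [(a, b) for a in split_trees(r) for b in split_trees(n // r)]
    return trees


def nodes(tree, side=None):
    """Yield (side, size, is_leaf) for every node; the root has side None."""
    is_leaf = isinstance(tree, int)
    yield side, size(tree), is_leaf
    if not is_leaf:
        yield from nodes(tree[0], "L")
        yield from nodes(tree[1], "R")


def features(tree, sided, leaf_f, all_i):
    """Feature multiset: leaf F counts and/or non-root I counts, optionally by side."""
    keys = []
    for side, n, is_leaf in nodes(tree):
        tag = side if sided else None
        if leaf_f and is_leaf:
            keys.append(("F", tag, n))
        if all_i and side is not None:  # every non-root node contributes one I
            keys.append(("I", tag, n))
    return frozenset(Counter(keys).items())


FEATURE_SETS = {  # (sided, leaf_f, all_i)
    "Leaf F": (False, True, False),
    "All I": (False, False, True),
    "Leaf F and All I": (False, True, True),
    "Left/Right Leaf F": (True, True, False),
    "Left/Right All I": (True, False, True),
    "Left/Right Leaf F and All I": (True, True, True),
}


def partitions(n, flags):
    """Number of distinct feature multisets over all split trees of F_n."""
    return len({features(t, *flags) for t in split_trees(n)})


for name, flags in FEATURE_SETS.items():
    print(name, partitions(16, flags))
print("All Formulas", len(split_trees(16)))
```

Larger columns of Table 1 can be checked the same way, though the brute-force enumeration grows with the number of formulas.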

4.2.2 Weighted Average Relative Standard Deviation

While being able to partition a set of formulas into a large set of equivalence
classes is important, ultimately we are only concerned that all of the formulas
within a partition have roughly the same running time. A good set of features,
then, is one that separates formulas with significantly different running
times into different partitions, so that all formulas within a single partition have
roughly the same running time. As a measure of this, we define the “weighted
average relative standard deviation.” For each partition we calculate the standard
deviation of the running times of all the formulas that fall into that partition.
We then calculate the relative standard deviation for each partition by dividing
the standard deviation by the mean. Finally, we take a weighted average over all
partitions, weighting each relative standard deviation by the number of formulas
in the partition. See Table 2.


• Let $P_k$ be the set of formulas in partition $k$.

• Let $t_f$ be the running time of formula $f$.

• Let $m_k$ be the mean running time of the formulas in $P_k$. Then
$m_k = \frac{1}{|P_k|} \sum_{f \in P_k} t_f$.

• Let $\sigma_k$ be the standard deviation of the running times of the formulas
in $P_k$. Then $\sigma_k = \sqrt{\frac{1}{|P_k|} \sum_{f \in P_k} (t_f - m_k)^2}$.

• Let $r_k$ be the relative standard deviation of the running times of the
formulas in $P_k$. Then $r_k = \sigma_k / m_k$.

Then, the Weighted Average Relative Standard Deviation is

$$\frac{\sum_k |P_k|\, r_k}{\sum_k |P_k|}.$$

Table 2: Calculating Weighted Average Relative Standard Deviation
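A direct Python transcription of the Table 2 calculation follows, as a sketch. It assumes each formula arrives as a (feature key, running time) pair and uses the population standard deviation (dividing by $|P_k|$), so a partition containing a single formula contributes zero. The function name is hypothetical.

```python
import math
from collections import defaultdict


def weighted_avg_relative_std(formulas):
    """Weighted average relative standard deviation (Table 2).

    `formulas` is an iterable of (feature_key, running_time) pairs; formulas
    with equal feature keys fall into the same partition P_k.
    """
    partitions = defaultdict(list)
    for key, t in formulas:
        partitions[key].append(t)
    weighted_sum, total = 0.0, 0
    for times in partitions.values():
        m = sum(times) / len(times)                                       # m_k
        sigma = math.sqrt(sum((t - m) ** 2 for t in times) / len(times))  # sigma_k
        weighted_sum += len(times) * (sigma / m)                          # |P_k| * r_k
        total += len(times)
    return weighted_sum / total


data = [("a", 1.0), ("a", 3.0), ("b", 2.0)]
print(weighted_avg_relative_std(data))  # (2 * 0.5 + 1 * 0.0) / 3 = 0.333...
```

Under these choices, a feature set that places every formula in its own partition scores exactly zero, consistent with the 0.000% entry for the Left/Right Leaf F and All I features on $F_{16}$ in Table 3, where Table 1 shows 15 partitions for 15 formulas.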

Some results are shown in Table 3. Once again, all of the formulas automatically
generated from Cooley-Tukey expansions of various sized $F_n$'s were used.
Each formula was timed using the FFT package discussed in Section 3. Not
surprisingly, the feature sets with the fewest and the most partitions
(“Leaf F” and “Left/Right Leaf F and All I”) have the worst (largest) and best
(smallest) weighted average relative standard deviations, respectively. However,
even though the Left/Right Leaf F feature set had more partitions in some cases
than the All I feature set, it consistently had a much worse weighted average
relative standard deviation, nearly as bad as that for the Leaf F feature set.


This shows that simply having more partitions does not mean a feature set
better distinguishes formulas with different running times.


Feature set                       F_16    F_32    F_64   F_128   F_256   F_512
Leaf F                          3.807%  5.152%  6.631%  6.200%  7.015%  7.166%
All I                           0.905%  1.303%  1.437%  1.744%  1.772%  1.764%
Leaf F and All I                0.905%  1.303%  1.437%  1.744%  1.772%  1.741%
Left/Right Leaf F               2.739%  4.152%  5.955%  5.705%  6.648%  6.856%
Left/Right All I                0.290%  0.382%  0.484%  0.524%  0.580%  0.629%
Left/Right Leaf F and All I     0.000%  0.123%  0.181%  0.276%  0.324%  0.380%

Table 3: Weighted Average Relative Standard Deviation of Different Feature
Sets for all of the Cooley-Tukey Expansions of Different Sized $F_n$'s


5 Learning to Predict Running Times

Given that there are many different Cooley-Tukey expansions of large FFTs
and that they can have different running times, we would like to find the one
with the fastest running time. One simple approach would be to use the formula
generator to produce all of the formulas and to time each one on each different
machine that we might be interested in. Then the formula with the fastest time
can be determined for each machine.

There are two problems with this approach: (1) each formula may take a
non-trivial amount of time to run, and (2) there is a very large number of
formulas that need to be run. These problems make the approach intractable
for FFTs of even fairly modest sizes.

In this section, we present an approach to help solve the first problem. In
particular, our approach is as follows:

• Generate a small set of formulas automatically.

• Time each of these formulas.

• Describe the formulas by a set of appropriate features.

• Use this data to learn to quickly and accurately predict the running times
of the remaining formulas.

With the features discussed in the previous section and with some training data
obtained by timing a few formulas, we can use machine learning techniques to
produce a function approximator that can quickly predict the running times of
new formulas. Note that this still does not solve the second problem mentioned
above: we still must search through a large space of potential formulas. However,
we can now obtain a predicted running time much more quickly than we could
have obtained an actual running time.

Note that while accurately predicting a formula's running time allows the
fastest formula to be determined through exhaustive search over all formulas,
it is actually more than is necessary. In particular, accurately predicting which of
two formulas runs faster would also allow the fastest formula to be determined
through exhaustive search over all formulas. Thus, a learning algorithm need not
learn the exact running time if it can accurately predict which of two formulas
runs faster.
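The argument above needs only a comparator. In this Python sketch, `faster(a, b)` is a hypothetical stand-in for a learned predictor that answers whether formula a runs faster than formula b; a single pass over the formulas keeps the current winner. If the predictor is sometimes wrong, the pass can return a formula other than the true fastest, which is why the error rate on pairs matters.

```python
def fastest(formulas, faster):
    """Exhaustive search for the fastest formula using only pairwise comparisons.

    `faster(a, b)` returns True when a is predicted to run faster than b.
    With a correct comparator, len(formulas) - 1 comparisons suffice.
    """
    best = formulas[0]
    for candidate in formulas[1:]:
        if faster(candidate, best):
            best = candidate
    return best


# Toy example with known running times standing in for a learned comparator.
times = {"A": 3.2, "B": 1.7, "C": 2.5}
print(fastest(list(times), lambda a, b: times[a] < times[b]))  # B
```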

5.1 Experimental Setup

We decided to learn to predict running times for the formulas generated by
Cooley-Tukey expansions of $F_{128}$. There are 731 such formulas. Timings were
obtained in two ways: (1) actual timings through the FFT package discussed
in Section 3, and (2) model approximations through the cost model shown in
Table 4.

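$\mathrm{Cost}(I_m \otimes A) = \mathrm{Cost}(A \otimes I_m) = m \cdot \mathrm{Cost}(A)$
$\mathrm{Cost}(A \cdot B) = \mathrm{Cost}(A) + \mathrm{Cost}(B)$
$\mathrm{Cost}(F_n) = a n^2 + b n + c$
$\mathrm{Cost}(T^n_s) = d n + e$
$\mathrm{Cost}(L^n_r) = 0$

We used $a = b = c = d = e = 1$.


Table 4: Simple Cost Model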
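The cost model can be evaluated recursively on a split tree. The Python sketch below assumes a hypothetical encoding in which a leaf is the integer $n$ (an unexpanded $F_n$) and an internal node is a pair (left, right) of size $n = rs$ standing for $(F_r \otimes I_s)\, T^n_s\, (I_r \otimes F_s)\, L^n_r$. Applying the rules above then gives $s \cdot \mathrm{Cost}(F_r) + (dn + e) + r \cdot \mathrm{Cost}(F_s)$ for an internal node. This mapping onto split trees is our reading, and the function names are hypothetical.

```python
def size(tree):
    """Size n of the F_n at the root of a split tree (leaf = int, node = pair)."""
    return tree if isinstance(tree, int) else size(tree[0]) * size(tree[1])


def cost(tree, a=1, b=1, c=1, d=1, e=1):
    """Table 4 cost of a split tree.

    Leaf n (unexpanded F_n): a*n^2 + b*n + c.
    Node (left, right), n = r*s: (F_r (x) I_s) T^n_s (I_r (x) F_s) L^n_r, i.e.
    s*Cost(F_r) + (d*n + e) + r*Cost(F_s) + 0, since composition costs add.
    """
    if isinstance(tree, int):
        return a * tree * tree + b * tree + c
    left, right = tree
    r, s = size(left), size(right)
    n = r * s
    return (s * cost(left, a, b, c, d, e) + (d * n + e)
            + r * cost(right, a, b, c, d, e))


print(cost(32))                # 1057: unexpanded F_32
print(cost(((2, 4), (2, 2))))  # 613: the Figure 1(b) tree
```

With $a = b = c = d = e = 1$ the quadratic leaf term dominates, so under this model the Figure 1(b) expansion is cheaper than leaving $F_{32}$ unexpanded.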

We used a neural network as the function approximator. For all of the results
presented, we used 25 hidden units, a learning rate of 0.01, and a momentum of
0.001. These parameters are not highly tuned, since the same values were used
across several different input feature sets (with varying numbers of inputs)
and across both desired outputs (running time, or the faster of two formulas).
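For readers who want to reproduce the setup, here is a minimal pure-Python sketch of a one-hidden-layer regression network trained by per-example gradient descent with momentum, using the 25 hidden units, learning rate 0.01, and momentum 0.001 given above. The tanh hidden units, linear output, squared-error loss, initialization range, and the class name `MLP` are our assumptions; the text does not specify them.

```python
import math
import random


class MLP:
    """One hidden layer of tanh units and a linear output unit (assumed design)."""

    def __init__(self, n_in, n_hidden=25, lr=0.01, momentum=0.001, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr, self.momentum = lr, momentum
        # Previous updates ("velocities") used by the momentum term.
        self.vw1 = [[0.0] * n_in for _ in range(n_hidden)]
        self.vb1 = [0.0] * n_hidden
        self.vw2 = [0.0] * n_hidden
        self.vb2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, y):
        """One update on the squared error 0.5 * (predict(x) - y) ** 2."""
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
        for j, hj in enumerate(h):
            # Gradient through tanh uses tanh'(z) = 1 - tanh(z)^2 and the
            # output weight w2[j] before it is changed below.
            delta = err * self.w2[j] * (1.0 - hj * hj)
            for i, xi in enumerate(x):
                self.vw1[j][i] = self.momentum * self.vw1[j][i] - self.lr * delta * xi
                self.w1[j][i] += self.vw1[j][i]
            self.vb1[j] = self.momentum * self.vb1[j] - self.lr * delta
            self.b1[j] += self.vb1[j]
            self.vw2[j] = self.momentum * self.vw2[j] - self.lr * err * hj
            self.w2[j] += self.vw2[j]
        self.vb2 = self.momentum * self.vb2 - self.lr * err
        self.b2 += self.vb2


net = MLP(n_in=2)
x, y = [1.0, 2.0], 0.5
before = abs(net.predict(x) - y)
for _ in range(200):
    net.train_step(x, y)
after = abs(net.predict(x) - y)
print(before, after)  # the error on this one example shrinks
```

In the experiments above the inputs would be a formula's feature counts and the target its running time; for the faster-of-two task, one plausible encoding (again an assumption) concatenates the features of the two formulas.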

5.2 Results

Results for the model data are shown in Table 5 and results for the real data
are shown in Table 6. These tables are broken into several groups of rows,
with each group corresponding to a particular set of features that were used. For
each of these groups, experiments were run with different sized training and test
sets. As a base case, all 731 formulas were used both in training and test in
the first row of each group. The 4 following rows in each group correspond to
randomly selecting a certain percentage of the formulas for training, with the
remaining formulas used as a test set. In this latter case, results are averaged
over 4 random selections of training sets.

The column marked “Average Percent Error on Predicting Cost” reports
the prediction error on the test set. In particular, it is calculated by dividing
the absolute difference between predicted cost and actual cost by the actual cost,
and then averaging over all formulas in the test set:

• Let $c_i$ be the actual running time of formula $i$.

• Let $p_i$ be the predicted running time of formula $i$.

• Then the average percent error on predicting cost is
$\frac{100\%}{N} \sum_{i=1}^{N} \frac{|p_i - c_i|}{c_i}$, where $N$ is the number of
formulas in the test set.

The column marked “Percent Mistakes on Predicting Faster of Two” reports
the prediction error on a random sampling of pairs of formulas in the test set.
In particular, the number of samplings was 100 times the number of formulas
in the test set. The percentage was calculated by taking the number of pairs of
formulas for which the network incorrectly predicted which formula ran faster and
dividing it by the total number of pairs of formulas tested.
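Both measures are straightforward to compute. This Python sketch assumes actual and predicted running times are given as parallel lists over the test set; sampling pairs of distinct formulas and counting a pair as a mistake whenever the predicted and actual orderings disagree are our assumptions, since the text does not say how pairs or ties were handled. Function names are hypothetical.

```python
import random


def average_percent_error(actual, predicted):
    """Mean of |p_i - c_i| / c_i over the test set, as a percentage."""
    return 100.0 * sum(abs(p - c) / c for c, p in zip(actual, predicted)) / len(actual)


def percent_pair_mistakes(actual, predicted, pairs_per_formula=100, seed=0):
    """Percentage of sampled pairs whose faster formula is mispredicted."""
    rng = random.Random(seed)
    n = len(actual)
    trials = pairs_per_formula * n  # 100 times the number of test formulas
    mistakes = 0
    for _ in range(trials):
        i, j = rng.sample(range(n), 2)  # two distinct formulas
        if (predicted[i] < predicted[j]) != (actual[i] < actual[j]):
            mistakes += 1
    return 100.0 * mistakes / trials


actual = [1.0, 2.0, 3.0, 4.0]
print(average_percent_error([1.0, 2.0], [1.1, 1.8]))       # about 10.0
print(percent_pair_mistakes(actual, actual))                # 0.0
print(percent_pair_mistakes(actual, [-t for t in actual]))  # 100.0
```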

The “Leaf F and All I” and “Left and Right Leaf F and All I” models yielded
the best learning results. These results were quite good, with less than 12% error
on predicting the faster of two formulas and less than 9% error on predicting
the running times. In fact, the error was much less than these in most cases.

Interestingly, the “Left and Right Leaf F and All I” model tended to predict
better for the larger training sets, while the “Leaf F and All I” model tended to
predict better for the smaller training sets. This can be understood from the
fact that the “Left and Right Leaf F and All I” model has a lower weighted
average relative standard deviation and can thus better distinguish formulas
with different running times, but when the training data is small, generalization
may not occur as easily with this model.

The “Leaf F” and “Left and Right Leaf F” models both perform significantly
worse than all of the other models at predicting the faster of two formulas. This
is not surprising, given that these two models had much larger weighted average
relative standard deviations.


6 Conclusions and Future Work

Through the use of rewrite rules and a formula generator, it is possible to
automatically generate different but mathematically equivalent formulas in a
principled way. These formulas can then be described with various sets of simple
features, which can reasonably partition the space of formulas into groups with
close running times. Finally, a function approximator can learn to accurately
predict the running time of a formula given a limited set of training data.

We are currently pursuing several lines of research that build upon the work
presented in this paper, including:

• Determining how well a function approximator can interpolate and
extrapolate to different size $F_n$'s. The results presented here were all for
Cooley-Tukey expansions of $F_{128}$. If a function approximator were
presented with Cooley-Tukey expansions of $F_{128}$ and $F_{512}$, could it predict
well for Cooley-Tukey expansions of $F_{256}$ or $F_{1024}$?

• Investigating other feature spaces. The features described in this paper
certainly are not the only ones that could be chosen.

• Investigating learning across machines and compilers. Could a function
approximator learn to predict running times for a particular machine or
compiler, given some appropriate features of the machine and compiler?

• Investigating other factorizations of the FFT and other signal processing
algorithms.

• Finding a solution to the problem that there is an extremely large number
of possible formulas representing signal processing algorithms. In particular,
it is not feasible to exhaustively generate all possible formulas for large
transforms. Instead, we are developing heuristic methods for searching the
space of formulas.
Table 5: Prediction Accuracy for Model Data


Table 6: Prediction Accuracy for Real Data


